Interactive Perception
ArtReg: Visuo-Tactile based Pose Tracking and Manipulation of Unseen Articulated Objects
Murali, Prajval Kumar, Kaboli, Mohsen
Robots operating in real-world environments frequently encounter unknown objects with complex structures and articulated components, such as doors, drawers, cabinets, and tools. The ability to perceive, track, and manipulate these objects without prior knowledge of their geometry or kinematic properties remains a fundamental challenge in robotics. In this work, we present a novel method for visuo-tactile-based tracking of unseen objects (single, multiple, or articulated) during robotic interaction, without assuming any prior knowledge of object shape or dynamics. Our pose tracking approach, termed ArtReg (short for Articulated Registration), integrates visuo-tactile point clouds in an unscented Kalman filter formulation on the SE(3) Lie group for point cloud registration. ArtReg is used to detect possible articulated joints in objects using purposeful manipulation maneuvers such as pushing or hold-pulling with a two-robot team. Furthermore, we leverage ArtReg to develop a closed-loop controller for goal-driven manipulation of articulated objects that moves the object into a desired pose configuration. We extensively evaluated our approach on various types of unknown objects through real-robot experiments, and we demonstrate the robustness of our method on objects with varying centers of mass, under low-light conditions, and against challenging visual backgrounds. Furthermore, we benchmarked our approach on a standard dataset of articulated objects and demonstrated improved pose accuracy compared to state-of-the-art methods. Our experiments indicate that robust and accurate pose tracking leveraging visuo-tactile information enables robots to perceive and interact with unseen complex articulated objects (with revolute or prismatic joints).
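For a concrete picture of the distinctive step in an unscented Kalman filter on SE(3): sigma points are drawn in the 6-dimensional tangent space and mapped onto the group through the exponential map. Below is a minimal NumPy sketch of that step; the function names, the (rho, omega) ordering, and the scaling parameter `lam` are illustrative choices, not details from the paper, and the standard UKF measurement update would follow in the tangent space.

```python
import numpy as np

def hat(w):
    """Skew-symmetric 3x3 matrix of a 3-vector w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map: 6-vector (rho, omega) -> 4x4 homogeneous transform."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:                      # small-angle Taylor expansion
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)
        V = np.eye(3) + B * W + C * (W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def sigma_points_se3(T_mean, P, lam=1.0):
    """2n+1 sigma points: right-perturb the mean pose in its tangent space."""
    n = 6
    L = np.linalg.cholesky((n + lam) * P)  # P is the 6x6 pose covariance
    pts = [T_mean]
    for i in range(n):
        pts.append(T_mean @ se3_exp(L[:, i]))
        pts.append(T_mean @ se3_exp(-L[:, i]))
    return pts
```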
RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph
Wang, Hecheng, Ren, Jiankun, Yu, Jia, Qi, Lizhe, Sun, Yunquan
Humans effortlessly retrieve objects in cluttered, partially observable environments by combining visual reasoning, active viewpoint adjustment, and physical interaction, all with only a single pair of eyes. In contrast, most existing robotic systems rely on carefully positioned fixed or multi-camera setups with complete scene visibility, which limits adaptability and incurs high hardware costs. We present RoboRetriever, a novel framework for real-world object retrieval that operates using only a single wrist-mounted RGB-D camera and free-form natural language instructions. RoboRetriever grounds visual observations to build and update a dynamic hierarchical scene graph that encodes object semantics, geometry, and inter-object relations over time. A supervisor module reasons over this memory and the task instruction to infer the target object and coordinate an integrated action module combining active perception, interactive perception, and manipulation. To enable task-aware, scene-grounded active perception, we introduce a novel visual prompting scheme that leverages large reasoning vision-language models to determine 6-DoF camera poses aligned with the semantic task goal and the geometric scene context. We evaluate RoboRetriever on diverse real-world object retrieval tasks, including scenarios with human intervention, demonstrating strong adaptability and robustness in cluttered scenes with only one RGB-D camera.
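As a rough illustration of what a dynamic hierarchical scene graph could look like in code, here is a hypothetical sketch; the class names, fields, and relation triples are assumptions for illustration, not RoboRetriever's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    """One object in the scene: semantics plus coarse geometry."""
    name: str            # e.g. "red mug"
    centroid: tuple      # (x, y, z) in the world frame
    last_seen: float     # timestamp of the most recent observation

@dataclass
class SceneGraph:
    """Dynamic scene graph: object nodes plus (subject, predicate, object) relations."""
    nodes: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)

    def update(self, node: ObjectNode, rels=()):
        """Insert or refresh an object and replace its outgoing relations."""
        self.nodes[node.name] = node
        self.relations = {r for r in self.relations if r[0] != node.name}
        self.relations |= set(rels)

g = SceneGraph()
g.update(ObjectNode("red mug", (0.4, 0.1, 0.8), last_seen=12.3),
         rels={("red mug", "on", "shelf"), ("red mug", "behind", "cereal box")})
```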
Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking
Wang, Xi, Chen, Tianxing, Yu, Qiaojun, Xu, Tianling, Chen, Zanxin, Fu, Yiting, Lu, Cewu, Mu, Yao, Luo, Ping
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered. Previous research has employed interactive perception for manipulating articulated objects, but open-loop approaches often suffer from overlooking the interaction dynamics. To address this limitation, we present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds. Our method builds on any interactive perception technique, inducing slight object movement to generate point cloud frames of the evolving dynamic scene. These point clouds are then segmented using the Segment Anything Model 2 (SAM2), after which the moving part of the object is masked for accurate online estimation of the motion axis, guiding subsequent robotic actions. Our approach significantly enhances the precision and efficiency of manipulation tasks involving articulated objects. Experiments in simulated environments demonstrate that our method outperforms baseline approaches, especially in tasks that demand precise axis-based control. Project Page: https://hytidel.github.io/video-tracking-for-axis-estimation/.
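To make the axis-estimation step concrete: given corresponding points on the moving part in two point-cloud frames, the rigid transform between them can be recovered with the Kabsch algorithm, and a revolute axis extracted from its rotation. A minimal sketch under those assumptions (correspondences given; function names ours, not the paper's):

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch: least-squares R, t such that Q_i ~ R @ P_i + t, for Nx3 corresponded points."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def revolute_axis(R, t):
    """Axis direction and one point on the axis for a revolute motion (R, t)."""
    w, V = np.linalg.eig(R)
    axis = np.real(V[:, np.argmin(np.abs(w - 1.0))])  # eigenvector for eigenvalue 1
    axis /= np.linalg.norm(axis)
    # A fixed point p satisfies (I - R) p = t; the system is rank-deficient along
    # the axis, so least squares returns the minimum-norm point on the axis.
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
    return axis, p
```

For a prismatic joint, R is close to the identity and the axis direction is simply t normalized.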
Interactive Perception for Deformable Object Manipulation
Weng, Zehang, Zhou, Peng, Yin, Hang, Kravberg, Alexander, Varava, Anastasiia, Navarro-Alarcon, David, Kragic, Danica
Interactive perception enables robots to manipulate the environment and objects to bring them into states that benefit the perception process. Deformable objects pose challenges to this paradigm due to significant manipulation difficulty and occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers motion regularity and structure in coupling the camera and manipulator. We contribute a method for constructing and computing a subspace, called Dynamic Active Vision Space (DAVS), for effectively exploiting this regularity in motion exploration. The effectiveness of the framework and approach is validated in both simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinated motion in interactive perception for deformable objects.
Enhancing Deformable Object Manipulation By Using Interactive Perception and Assistive Tools
In the field of robotic manipulation, proficiency at deformable object manipulation lags behind human capabilities due to the inherent characteristics of deformable objects. These objects have infinite degrees of freedom, resulting in non-trivial perception and state estimation, and complex dynamics, complicating the prediction of future configurations. Although recent research has focused on deformable object manipulation, most approaches rely on static vision and simple manipulation techniques, limiting performance. This paper proposes two solutions: interactive perception and the use of assistive tools. The first solution posits that optimal perspectives exist during deformable object manipulation, making state estimation easier; by exploiting action-perception regularities, interactive perception improves both manipulation and perception. The second solution advocates the use of assistive tools, a hallmark of human intelligence, to improve manipulation performance. For instance, a folding board can aid garment folding by reducing object deformation and taming complex dynamics. This research therefore aims to address the deformable object manipulation problem by incorporating interactive perception and assistive tools to augment manipulation performance.
Bagging by Learning to Singulate Layers Using Interactive Perception
Chen, Lawrence Yunliang, Shi, Baiyu, Lin, Roy, Seita, Daniel, Ahmad, Ayah, Cheng, Richard, Kollar, Thomas, Held, David, Goldberg, Ken
Many fabric handling and 2D deformable material tasks in homes and industry require singulating layers of material, such as opening a bag or arranging garments for sewing. In contrast to methods requiring specialized sensing or end effectors, we use only visual observations with ordinary parallel-jaw grippers. We propose SLIP: Singulating Layers using Interactive Perception, and apply SLIP to the task of autonomous bagging. We develop SLIP-Bagging, a bagging algorithm that manipulates a plastic or fabric bag from an unstructured state and uses SLIP to grasp the top layer of the bag to open it for object insertion. In physical experiments, a YuMi robot achieves a success rate of 67% to 81% across bags of a variety of materials, shapes, and sizes, significantly improving success rate and generality over prior work. Experiments also suggest that SLIP can be applied to tasks such as singulating layers of folded cloth and garments. Supplementary material is available at https://sites.google.com/view/slip-bagging/.
Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening
Schorp, Vincent, Panitch, Will, Shivakumar, Kaushik, Viswanath, Vainavi, Kerr, Justin, Avigal, Yahav, Fer, Danyal M, Ott, Lionel, Goldberg, Ken
Accurate 3D sensing of suturing thread is a challenging problem in automated surgical suturing because of the high state-space complexity, thinness and deformability of the thread, and possibility of occlusion by the grippers and tissue. In this work we present a method for tracking surgical thread in 3D which is robust to occlusions and complex thread configurations, and apply it to autonomously perform the surgical suture "tail-shortening" task: pulling thread through tissue until a desired "tail" length remains exposed. The method utilizes a learned 2D surgical thread detection network to segment suturing thread in RGB images. It then identifies the thread path in 2D and reconstructs the thread in 3D as a NURBS spline by triangulating the detections from two stereo cameras. Once a 3D thread model is initialized, the method tracks the thread across subsequent frames. Experiments suggest the method achieves a 1.33-pixel average reprojection error on challenging single-frame 3D thread reconstructions, and a 0.84-pixel average reprojection error on two tracking sequences. On the tail-shortening task, it accomplishes a 90% success rate across 20 trials. Supplemental materials are available at https://sites.google.com/berkeley.edu/autolab-surgical-thread/.
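As a rough sketch of the reconstruction step, one can triangulate matched 2D thread detections from the two stereo views and fit a smooth 3D curve through them. The sketch below uses OpenCV triangulation and a SciPy smoothing B-spline as a stand-in for the paper's NURBS representation; the projection matrices and matched, thread-ordered pixel paths are assumed given.

```python
import numpy as np
import cv2
from scipy.interpolate import splprep, splev

def reconstruct_thread(P1, P2, pts1, pts2, smooth=0.0):
    """P1, P2: 3x4 camera projection matrices; pts1, pts2: 2xN matched thread
    pixels ordered along the thread. Returns a parametric 3D spline."""
    Xh = cv2.triangulatePoints(P1, P2,
                               pts1.astype(np.float64), pts2.astype(np.float64))
    X = Xh[:3] / Xh[3]                              # 3xN Euclidean thread points
    tck, _ = splprep([X[0], X[1], X[2]], s=smooth)  # smoothing B-spline fit
    return tck

# Evaluate 100 samples along a fitted curve:
# xs, ys, zs = splev(np.linspace(0.0, 1.0, 100), tck)
```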
SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception
Shivakumar, Kaushik, Viswanath, Vainavi, Gu, Anrui, Avigal, Yahav, Kerr, Justin, Ichnowski, Jeffrey, Cheng, Richard, Kollar, Thomas, Goldberg, Ken
Cables are commonplace in homes, hospitals, and industrial warehouses and are prone to tangling. This paper extends prior work on autonomously untangling long cables by introducing novel uncertainty quantification metrics and actions that interact with the cable to reduce perception uncertainty. We present Sliding and Grasping for Tangle Manipulation 2.0 (SGTM 2.0), a system that autonomously untangles cables approximately 3 meters in length with a bilateral robot, using estimates of uncertainty at each step to inform actions. By interactively reducing uncertainty, SGTM 2.0 reduces the number of state-resetting moves it must take, significantly speeding up run-time. Experiments suggest that SGTM 2.0 can achieve 83% untangling success on cables with 1 or 2 overhand and figure-8 knots, and 70% termination detection success across these configurations, outperforming SGTM 1.0 by 43% in untangling accuracy and 200% in full rollout speed. Supplementary material, visualizations, and videos can be found at sites.google.com/view/sgtm2.
Interactive Perception at Toyota Research Institute
Dr. Carolyn Matl, Research Scientist at Toyota Research Institute, explains why Interactive Perception and soft tactile sensors are critical for manipulating challenging objects such as liquids, grains, and dough. She also dives into "StRETcH," a Soft to Resistive Elastic Tactile Hand: a variable-stiffness soft tactile end-effector presented by her research group.

Carolyn Matl is a research scientist at the Toyota Research Institute, where she works on robotic perception and manipulation with the Mobile Manipulation Team. She received her B.S.E. in Electrical Engineering from Princeton University in 2016, and her Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 2021. At Berkeley, she was awarded the NSF Graduate Research Fellowship and was advised by Ruzena Bajcsy. Her dissertation work focused on developing and leveraging non-traditional sensors for robotic manipulation of complicated objects and substances like liquids and doughs.

Would you mind introducing yourself?

Thank you so much for having me on the podcast. I'm Carolyn Matl and I'm a research scientist at the Toyota Research Institute, where I work with a really great group of people on the Mobile Manipulation Team on fun and challenging robotic perception and manipulation problems.
SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning
Lv, Jun, Yu, Qiaojun, Shao, Lin, Liu, Wenhai, Xu, Wenqiang, Lu, Cewu
Building general-purpose robots to perform an enormous number of tasks in a large variety of environments at the human level is notoriously complicated. It requires robot learning that is sample-efficient, generalizable, compositional, and incremental. In this work, we introduce a systematic learning framework called SAGCI-system to achieve these four requirements. Our system first takes as input the raw point clouds gathered by a camera mounted on the robot's wrist and produces an initial model of the surrounding environment represented as a URDF. Our system adopts a learning-augmented differentiable simulation that loads the URDF. The robot then uses interactive perception to interact with the environment, verifying and modifying the URDF online. Leveraging the simulation, we propose a new model-based RL algorithm combining object-centric and robot-centric approaches to efficiently produce policies that accomplish manipulation tasks. We apply our system to articulated object manipulation, both in simulation and in the real world. Extensive experiments demonstrate the effectiveness of our proposed learning framework. Supplemental materials and videos are available at https://sites.google.com/view/egci.
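To give a flavor of the URDF representation the abstract refers to: an environment model with one estimated revolute joint could be serialized as below. This is a generic, hypothetical two-link URDF with placeholder geometry and limits, not SAGCI's actual output format.

```python
def make_urdf(axis, origin, name="articulated_object"):
    """Minimal two-link URDF with one revolute joint at `origin` about `axis`."""
    ax = " ".join(f"{v:.4f}" for v in axis)
    xyz = " ".join(f"{v:.4f}" for v in origin)
    return f"""<robot name="{name}">
  <link name="base"><visual><geometry><box size="0.5 0.5 0.5"/></geometry></visual></link>
  <link name="part"><visual><geometry><box size="0.5 0.5 0.1"/></geometry></visual></link>
  <joint name="joint0" type="revolute">
    <parent link="base"/><child link="part"/>
    <origin xyz="{xyz}"/><axis xyz="{ax}"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1"/>
  </joint>
</robot>"""

print(make_urdf(axis=(0.0, 0.0, 1.0), origin=(0.2, 0.0, 0.4)))
```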